Abstract The Multiple Try Metropolis (MTM) algorithm is an advanced
MCMC technique based on drawing and testing several candidates at each
iteration of the algorithm. One of them is selected according to certain weights
and then tested according to a suitable acceptance probability. Since the
computational cost increases with the number of tries, one expects the
performance of an MTM scheme to improve as the number of tries increases
as well. However, there are scenarios where increasing the number of tries
does not produce a corresponding enhancement of the performance. In this
work, we describe these scenarios and then introduce possible solutions to
these issues.

Keywords Multiple Try Metropolis algorithm; Multi-point Metropolis 
algorithm; MCMC methods; MTM with variable number of tries. 


1 Introduction 


Markov chain Monte Carlo (MCMC) methods are classical Monte Carlo
techniques (Robert and Casella 2004) that produce a Markov chain
converging to a target probability density function (pdf), usually to
approximate an otherwise-incalculable integral (Liu 2004; Liang et al.).
The Multiple-Try Metropolis (MTM) method (Liu et al. 2000) is an
extension of the Metropolis-Hastings algorithm (Metropolis et al. 1953;
Hastings 1970) in which the next state of the chain is selected among a set
of N independent and identically distributed (i.i.d.) samples. This enables
the MTM sampler to make large jumps without lowering the acceptance rate,
and thus MTM can explore a larger portion of the sample space in fewer
iterations. Different MTM schemes have been proposed in the literature
(Frenkel and Smit 1996, Chapter 13; Qin and Liu 2001; Casarin et al. 2013;
Pandolfi et al. 2010; Martino et al. 2012; Craiu and Lemieux 2007) and have
been studied in several works (Bedard et al. 2012; Martino and Read 2013;
Martino et al. 2014). More recently, parallel MTM algorithms have been
proposed in (Martino et al. 2015a).


A well-designed MTM scheme improves its performance as the number of
tries, N, grows. Namely, when N grows to infinity, the correlation among the
generated samples should vanish. Clearly, this comes at the expense of an
increasing computational cost due to the use of a greater number of tries. In
this work, we describe certain scenarios where the use of a greater number of
tries in a standard MTM method (Liu et al. 2000) and its extensions (Casarin
et al. 2013; Pandolfi et al. 2010; Martino et al. 2012; Martino and Read 2013)
does not yield an improvement in the performance. We explain the reasons for
these drawbacks and provide possible solutions. The first scenario involves the
use of a single random-walk proposal within a standard MTM structure,
whereas the second scenario considers the use of multiple proposal pdfs that
are independent of the previous state of the chain. In the first one, increasing
the number of tries is always prejudicial, regardless of the choice of the weight
functions (which involve the target function in a suitable way; Liu et al. 2000;
Martino and Read 2013). In the second one, increasing the number of tries can
help the mixing of the chain when a certain class of weight functions is used
(clearly, at the expense of a greater computational cost). However, we discuss
different ways of using the set of multiple independent proposal pdfs within an
MTM scheme that improve the performance in any case. To improve the
performance in the first scenario, we suggest using an MTM with a variable
number of tries, chosen in a suitable way that does not jeopardize the
ergodicity of the chain.


2 Multiple Try Metropolis with a single random-walk proposal 


Let us denote the target density as $\bar{\pi}(x) \propto \pi(x)$. First of all, we consider the
use of a single random-walk proposal density, $q(z|x_{t-1}) = q(z - x_{t-1})$. Given a
current state of the chain $x_{t-1} \in \mathcal{X} \subseteq \mathbb{R}^{d}$, an MTM scheme generates
$N$ independent candidates $\{z_1,\ldots,z_N\}$ from the proposal density $q$, i.e.,

$$z_1,\ldots,z_N \sim q(z|x_{t-1}).$$

Then, one sample $z$ is selected among the set $\{z_1,\ldots,z_N\}$, according to certain
weight functions (Liu et al. 2000; Martino and Read 2013). The movement
from $x_{t-1}$ to $z$ is accepted with a suitable probability $\alpha(x_{t-1},z)$, which also
depends on the rest of the candidates. The probability $\alpha(x_{t-1},z)$ is designed such
that the kernel of the MTM algorithm fulfills the detailed balance condition.
To ease the exposition, we consider the importance weights

$$w(z_k|x_{t-1}) = \frac{\pi(z_k)}{q(z_k|x_{t-1})}, \tag{4}$$
Table 1 Multiple Try Metropolis with a (single) random-walk proposal (RW-MTM).

1. Draw $N$ independent samples from the proposal pdf,

   $$z_1,\ldots,z_N \sim q(z|x_{t-1}) = q(z - x_{t-1}).$$

2. Select a sample $z \in \{z_1,\ldots,z_N\}$, according to the probabilities

   $$\bar{w}_k = \frac{w(z_k|x_{t-1})}{\sum_{n=1}^{N} w(z_n|x_{t-1})}, \quad \text{where} \quad w(z_k|x_{t-1}) = \frac{\pi(z_k)}{q(z_k|x_{t-1})}, \tag{1}$$

   for $k = 1,\ldots,N$.

3. Draw $N - 1$ auxiliary points from the proposal $q$ given the previously selected sample
   $z$, namely $y_1,\ldots,y_{N-1} \sim q(x|z)$, and set $y_N = x_{t-1}$.

4. Compute the weights of the auxiliary points,

   $$w(y_k|z) = \frac{\pi(y_k)}{q(y_k|z)}, \quad k = 1,\ldots,N. \tag{2}$$

5. Set $x_t = z$ with probability

   $$\alpha(x_{t-1}, z) = \min\left[1, \frac{\sum_{n=1}^{N} w(z_n|x_{t-1})}{\sum_{n=1}^{N} w(y_n|z)}\right]. \tag{3}$$

   Otherwise, set $x_t = x_{t-1}$, with probability $1 - \alpha(x_{t-1},z)$.


for choosing $z \in \{z_1,\ldots,z_N\}$, i.e., $z$ is selected according to the probabilities

$$\bar{w}_k = \frac{w(z_k|x_{t-1})}{\sum_{n=1}^{N} w(z_n|x_{t-1})}.$$

Different kinds of weights could be used (Martino and Read 2013; Pandolfi
et al. 2010), but they do not avoid the problem that we describe in the next
section.

Table 1 shows all the details of the MTM technique. Observe that an RW-MTM
method requires the generation of $N - 1$ auxiliary points $y_1,\ldots,y_{N-1}$
from $q(\cdot|z)$ (see Step 3 of Table 1). Moreover, note that the selected sample $z$
is drawn from the empirical measure

$$\widehat{\pi}^{(N)}(z) = \sum_{n=1}^{N} \bar{w}_n \, \delta(z - z_n), \tag{5}$$

that approximates the distribution $\bar{\pi}$ via importance sampling (IS) (Robert
and Casella 2004; Liu 2004). Finally, we remark that the acceptance
probability $\alpha(x_{t-1},z)$ in Eq. (3) can be expressed as

$$\alpha(x_{t-1},z) = \min\left[1, \frac{\widehat{Z}(z_1,\ldots,z_N|x_{t-1})}{\widehat{Z}(y_1,\ldots,y_N|z)}\right], \tag{6}$$

where the function $\widehat{Z}(\cdot|r): \mathcal{X}^N \to \mathbb{R}$, with $r \in \mathcal{X}$,

$$\widehat{Z}(v_1,\ldots,v_N|r) = \frac{1}{N} \sum_{n=1}^{N} \frac{\pi(v_n)}{q(v_n|r)}, \tag{7}$$

is an estimator of the normalizing constant $Z = \int_{\mathcal{X}} \pi(x)\,dx$ (Robert and
Casella 2004), i.e., of the area below $\pi(x)$.
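The five steps of Table 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the proposal is taken to be an isotropic Gaussian random walk, the weights are the importance weights, and the names `gauss_pdf`, `rw_mtm_step`, `target`, `n_tries` and `sigma` are hypothetical. The callable `target` is assumed to return the unnormalized density $\pi(x)$ as a positive float.

```python
import numpy as np

def gauss_pdf(z, mean, sigma):
    """Isotropic Gaussian density N(z; mean, sigma^2 I), evaluated row-wise."""
    z, mean = np.atleast_1d(z), np.atleast_1d(mean)
    d = z.shape[-1]
    sq = np.sum((z - mean) ** 2, axis=-1)
    return np.exp(-0.5 * sq / sigma**2) / (2 * np.pi * sigma**2) ** (d / 2)

def rw_mtm_step(x_prev, target, n_tries, sigma, rng):
    """One RW-MTM iteration with importance weights (Steps 1-5 of Table 1)."""
    x_prev = np.atleast_1d(np.asarray(x_prev, dtype=float))
    d = x_prev.size
    # Step 1: N candidates from the (assumed Gaussian) random-walk proposal.
    z = x_prev + sigma * rng.standard_normal((n_tries, d))
    w_z = np.array([target(zk) for zk in z]) / gauss_pdf(z, x_prev, sigma)
    # Step 2: select one candidate with probability proportional to its weight.
    j = rng.choice(n_tries, p=w_z / w_z.sum())
    z_sel = z[j]
    # Step 3: N-1 auxiliary points around the selected candidate, and y_N = x_prev.
    y = z_sel + sigma * rng.standard_normal((n_tries - 1, d))
    y = np.vstack([y, x_prev])
    # Step 4: weights of the auxiliary points.
    w_y = np.array([target(yk) for yk in y]) / gauss_pdf(y, z_sel, sigma)
    # Step 5: accept with probability min(1, sum of w_z / sum of w_y).
    alpha = min(1.0, w_z.sum() / w_y.sum())
    x_new = z_sel if rng.random() < alpha else x_prev
    return x_new, alpha
```

The ratio in Step 5 equals the ratio of the two estimators of the normalizing constant, since the common factor 1/N cancels. The sketch does not guard against numerical underflow of the weights, which a practical implementation would handle, for instance by working with log-weights.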


3 Problem in the RW-MTM mixing 


The desired behavior of an MTM scheme is that the performance improves as
the number of candidates $N$ grows (jointly with the computational cost).
Indeed, in general, as $N$ increases, the chosen point $z$ is selected from a better
IS approximation of $\bar{\pi}$, so that $z$ is a better candidate to be tested as the
new possible state of the chain. As a consequence, in a well-designed MTM
scheme the acceptance probability $\alpha(x_{t-1}, z)$ should approach 1 when $N \to \infty$.
Thus, in general, MTM fosters greater "jumps" and, as a consequence, a faster
exploration of the state space. However, below we describe a scenario where
increasing the number $N$ of tries can even be damaging.

To ease the explanation, we assume that the expected value of
the random variable $Z \sim q(z - x_{t-1})$ is exactly $x_{t-1}$, i.e., $E[Z] = x_{t-1}$,
e.g., when $q$ is Gaussian, $q(z - x_{t-1}) = \mathcal{N}(z; x_{t-1}, C)$. Let us denote
$\widehat{Z}_1 = \widehat{Z}(z_1,\ldots,z_N|x_{t-1})$ and $\widehat{Z}_2 = \widehat{Z}(y_1,\ldots,y_N|z)$, so that we can rewrite the
acceptance probability as

$$\alpha = \min\left[1, \frac{\widehat{Z}_1}{\widehat{Z}_2}\right]. \tag{8}$$


Furthermore, consider a scenario where the state in the $(t-1)$-th iteration,
$x_{t-1}$, is placed in a region of low probability of $\bar{\pi}(x) \propto \pi(x)$, near a region of
high probability mass (e.g., see Figure 1(a)). Assume also that the variance of
the proposal $q(z - x_{t-1})$ is large enough to (at least) reach the region
of high probability mass of $\bar{\pi}$. In this situation, several drawn tries are located
in the region of small probability around the value $E[Z] = x_{t-1}$. On the other
hand, it is possible that a few of them are located close to the mode of $\bar{\pi}$; Figure
1(a) depicts a possible scenario of this kind, with only $N = 4$ tries and one
of them located in a mode of $\bar{\pi}$. Thus, it is highly probable that the MTM
selects one well-located point as proposed sample $z$, after the resampling at
Step 2. For the same reasons, in general, many of the $N - 1$ auxiliary points
$y_1,\ldots,y_{N-1}$, drawn from $q(y|z)$, will be placed around the mode of $\bar{\pi}$. Hence,
in this situation, we have that

$$\widehat{Z}_2 = \frac{1}{N}\sum_{n=1}^{N} \frac{\pi(y_n)}{q(y_n|z)} \gg \widehat{Z}_1 = \frac{1}{N}\sum_{n=1}^{N} \frac{\pi(z_n)}{q(z_n|x_{t-1})}.$$

As a consequence,

$$\alpha(x_{t-1},z) \approx 0,$$


so that the chain can remain stuck at $x_{t-1}$. It is important to observe that
this situation can become even worse if $N$ grows. On the contrary, in this
scenario, the use of a smaller number of tries can help to jump to the region
of high probability. Finally, we remark that the problem previously described
cannot be solved by changing the analytical form of the weights (Liu et al.
2000; Martino and Read 2013).^1

Fig. 1 Graphical representation of a possible scenario described in Section 3, where $\widehat{Z}_2 > \widehat{Z}_1$
(and $\widehat{Z}_2 \gg \widehat{Z}_1$ when $N$ grows). We show the contour plot of a bidimensional target pdf
$\pi(x)$ with solid lines. The previous state of the chain $x_{t-1}$ is depicted with a square; the
$N = 4$ candidates $z_j$ are shown with circles, whereas the $N - 1 = 3$ auxiliary points $y_j$
are illustrated with triangles. Dashed lines represent the scale parameters of the proposal
densities $q(\cdot|x_{t-1})$ and $q(\cdot|z)$, where $z \in \{z_1,\ldots,z_4\}$ is the selected candidate.


3.1 Proposed solution 


Let us denote as $K_m(x_t|x_{t-1}, N_m)$ the kernel of an MTM scheme employing
$N_m$ tries. We consider a combination of $M$ different kernels, each of which uses
a different number of tries $N_m$, $m = 1,\ldots,M$, i.e.,

$$K(x_t|x_{t-1}) = \frac{1}{M} \sum_{m=1}^{M} K_m(x_t|x_{t-1}, N_m). \tag{9}$$

It is straightforward to show that if each $K_m(x_t|x_{t-1},N_m)$ leaves $\bar{\pi}$ invariant,
then $K(x_t|x_{t-1})$ also has $\bar{\pi}$ as invariant pdf (Robert and Casella 2004; Liu
2004). Therefore, fixing the averaged computational effort, represented by the
averaged number of tries

$$\bar{N} = \frac{1}{M} \sum_{m=1}^{M} N_m,$$

we choose $M$ different values $N_m \in \mathbb{N}$ such that $\bar{N}$ is the desired one. The
idea is to use a variable number of tries, i.e., a different number of candidates
at each iteration. Namely, at each iteration, an index $m'$ is drawn uniformly
within $1,\ldots,M$ and then $N_{m'}$ tries are employed in the MTM scheme $K_{m'}$.
Note that this is equivalent to using the kernel in Eq. (9). Choosing at least one
small value, e.g., $N_1 = 1$, helps the chain jump in the awkward scenario
previously described. See the numerical simulations for further details.

^1 A suitable acceptance function $\alpha$ for generic weight functions is shown in Appendix A
for the case of multiple independent proposal densities.
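The variable-number-of-tries idea can be sketched as a thin wrapper around any single-iteration MTM routine. The code below is an illustrative sketch, not the authors' implementation: `mtm_step` is a hypothetical callable with signature `(x, n_tries, rng) -> new_state` that is assumed to implement one valid MTM iteration, and `tries_schedule` encodes one possible choice with M = 3 values whose average equals the desired number of tries.

```python
import numpy as np

def tries_schedule(n_bar):
    """One possible choice with M = 3: N_1 = 1, N_2 = n_bar, N_3 = 2*n_bar - 1.
    The average (1 + n_bar + 2*n_bar - 1) / 3 equals n_bar."""
    return [1, n_bar, 2 * n_bar - 1]

def variable_tries_chain(x0, mtm_step, tries, n_iter, rng):
    """Run the mixture kernel of Eq. (9): at each iteration, draw an index m
    uniformly and apply the MTM kernel that uses tries[m] candidates."""
    x = x0
    chain, used = [], []
    for _ in range(n_iter):
        n_m = tries[rng.integers(len(tries))]  # uniform choice of the kernel
        x = mtm_step(x, n_m, rng)              # hypothetical MTM iteration
        chain.append(x)
        used.append(n_m)
    return chain, used
```

Because the index is drawn independently of the current state, each iteration applies one of the M kernels with probability 1/M, which is the mixture in Eq. (9).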


4 Multiple Try Metropolis with different independent proposals 

The MTM algorithm in Table 1 can be simplified if the proposal pdf $q(x)$ is
independent of the previous state of the generated chain. Indeed, in this
case, Step 3 in Table 1 can be removed, in the sense that it is possible
to avoid the generation of the auxiliary points (Liu et al. 2000; Martino
and Read 2013). Furthermore, it is also possible to employ simultaneously
different proposal pdfs $q_1(x),\ldots,q_N(x)$ (Casarin et al. 2013; Martino and
Read 2013). The resulting algorithm is detailed in Table 2, considering the
use of importance weights. The acceptance probability $\alpha$ in Eq. (12) can be
written again as

$$\alpha = \min\left[1, \frac{\widehat{Z}_1}{\widehat{Z}_2}\right],$$

where, in this case,

$$\widehat{Z}_1 = \frac{1}{N}\sum_{n=1}^{N} w_n(z_n), \qquad \widehat{Z}_2 = \frac{1}{N}\left[\sum_{n=1}^{N} w_n(z_n) - w_j(z_j) + w_j(x_{t-1})\right]. \tag{10}$$

The general acceptance function $\alpha$ for I-MTM using generic (bounded and
positive) weights is shown in Eq. (14).


5 Problem in the I-MTM mixing 


First of all, we can observe that the sums in $\widehat{Z}_1$ and $\widehat{Z}_2$ in Eq. (10) differ only
in one weight, i.e., $\widehat{Z}_1$ contains $w_j(z_j)$ but does not involve $w_j(x_{t-1})$, whereas
$\widehat{Z}_2$ includes $w_j(x_{t-1})$ instead of $w_j(z_j)$. Thus, using importance weights, the
probability $\alpha$ of an I-MTM scheme always approaches 1 when $N$ increases, if
the employed weight functions are included in the class of weights proposed
in (Liu et al. 2000).^2 This statement is instead not valid, in general, for the


^2 Considering the case of independent proposal pdfs, the class of weights in (Liu et al.
2000) is defined as $w_k(z_k|x) = \pi(z_k)\,q_k(x)\,\lambda_k(z_k,x)$ with $k = 1,\ldots,N$, where $\lambda_k(z_k,x) =
\lambda_k(x, z_k)$ is a generic symmetric function w.r.t. $z_k$ and $x$. As an example, if we set
$\lambda_k(z_k,x) = \frac{1}{q_k(z_k)\,q_k(x)}$, we obtain the importance weights $w_k(z_k|x) = w_k(z_k) = \frac{\pi(z_k)}{q_k(z_k)}$.
Table 2 Multiple Try Metropolis with different independent proposals (I-MTM).

1. Draw $N$ independent samples

   $$z_1 \sim q_1(x), \ldots, z_N \sim q_N(x).$$

2. Select a sample $z_j \in \{z_1,\ldots,z_N\}$, according to the probabilities

   $$\bar{w}_k = \frac{w_k(z_k)}{\sum_{n=1}^{N} w_n(z_n)}, \quad \text{where} \quad w_k(z_k) = \frac{\pi(z_k)}{q_k(z_k)}, \tag{11}$$

   for $k = 1,\ldots,N$.

3. Set $x_t = z_j$ with probability

   $$\alpha(x_{t-1},z_j) = \min\left[1, \frac{\sum_{n=1}^{N} w_n(z_n)}{\sum_{n=1}^{N} w_n(z_n) - w_j(z_j) + w_j(x_{t-1})}\right]. \tag{12}$$

   Otherwise, set $x_t = x_{t-1}$, with probability $1 - \alpha(x_{t-1},z_j)$.


generic weight functions given in (Pandolfi et al. 2010; Martino and Read
2013) and recalled in Eq. (14).
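One I-MTM iteration with importance weights can be sketched as follows. This is a hedged illustration of Table 2 rather than a reference implementation: the names `i_mtm_alpha`, `i_mtm_step`, `samplers` and `pdfs` are hypothetical, `samplers[k](rng)` is assumed to return one draw from the k-th proposal, and `pdfs[k](x)` is assumed to return its density at `x`.

```python
import numpy as np

def i_mtm_alpha(w, j, w_x):
    """Acceptance probability of Eq. (12): the two sums differ in one weight,
    w[j] = w_j(z_j) being replaced by w_x = w_j(x_{t-1}) in the denominator."""
    total = float(np.sum(w))
    return min(1.0, total / (total - w[j] + w_x))

def i_mtm_step(x_prev, target, samplers, pdfs, rng):
    """One I-MTM iteration with importance weights (Steps 1-3 of Table 2)."""
    n = len(pdfs)
    # Step 1: one candidate from each independent proposal q_k.
    z = [samplers[k](rng) for k in range(n)]
    # Importance weights w_k(z_k) = pi(z_k) / q_k(z_k).
    w = np.array([target(z[k]) / pdfs[k](z[k]) for k in range(n)])
    # Step 2: select index j with probability proportional to w_j(z_j).
    j = rng.choice(n, p=w / w.sum())
    # Step 3: the j-th weight function is also evaluated at the previous state.
    w_x = target(x_prev) / pdfs[j](x_prev)
    alpha = i_mtm_alpha(w, j, w_x)
    x_new = z[j] if rng.random() < alpha else x_prev
    return x_new, alpha, j
```

For instance, with weights `w = [1, 3]` and selected index `j = 1`, the call `i_mtm_alpha(w, 1, 3.0)` gives 1, whereas a much larger weight at the previous state, `w_x = 12.0`, gives 4/13. This is the mechanism by which a previous state lying in a tail of the selected proposal drives the acceptance probability down.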


In this section we focus on the use of importance weights, which are
contained in the class discussed in (Liu et al. 2000). The solutions that we discuss
later on are valid in any case, including the use of the generic weights in
Appendix A. Note that, in I-MTM, the $j$-th weight involves the $j$-th proposal
pdf, i.e.,

$$w_j(x) = \frac{\pi(x)}{q_j(x)}.$$

We need to evaluate the $j$-th weight $w_j$, involving the $j$-th proposal $q_j$, at $z_j$
and $x_{t-1}$. The sample $z_j$ is drawn from $q_j$ by definition, whereas $x_{t-1}$ is the
previous state of the chain (it could have been generated from any possible $q_n$ in the
previous iterations of the I-MTM algorithm). Hence, with high probability $z_j$
is located near a mode of $q_j$, since $z_j \sim q_j(z)$, whereas $x_{t-1}$ could be placed
close to a mode or in a tail of $q_j$ with equal chance, in general. Thus, since the
proposal $q_j$ appears in the denominator of the weights $w_j$, in general we have
$w_j(z_j) < w_j(x_{t-1})$, producing small values of the acceptance probability $\alpha$ if $N$ is
not big enough. This scenario becomes even more complicated if the proposal
pdf $q_j$ is placed close to a mode of the target $\bar{\pi}$ and the previous state $x_{t-1}$ is
located in a tail of $q_j$. In this case, if $\pi(x_{t-1}) \neq 0$, the value of $w_j(x_{t-1})$ can
be huge and $w_j(x_{t-1}) \gg w_j(z_j)$. Hence, the I-MTM scheme tends to select
the sample drawn from $q_j$, i.e., $z_j$, several times as a "good" candidate (Step 2
of Table 2), but the movement from $x_{t-1}$ to $z_j$ is often rejected since $\alpha \approx 0$.
As a consequence, the chain can remain indefinitely trapped in this situation.
Figure 2 gives a graphical sketch of this situation.
Fig. 2 Graphical representation of the scenario described in Section 5. The contour plot
of a bimodal (unnormalized) target pdf $\pi(x)$ is depicted with solid lines, whereas the $j$-th
(unnormalized) proposal pdf $q_j(x)$ is shown with dashed lines.

5.1 Proposed solutions 

Below, we discuss different possible solutions, ordered by increasing theoretical
complexity and practical interest. It is important to remark that changing
the analytic form of the weights is not a solution, as shown in Appendix A.

First solution. First of all, let us consider the possibility of using a greater
number of tries while keeping fixed the number $N$ of proposal pdfs, i.e., denoting
with $P$ the number of tries, we have $P > N$ with $P = kN$ and $k \in \mathbb{N}$.
The problem described above could be solved by increasing $P$, when the
weights used are importance weights.^3 If $x_{t-1}$ is located in a tail of $q_j$, the value
of $P$ required to solve the issue could be huge. Moreover, this trivial solution
entails an increase of the computational cost in terms of evaluations of the
target function. In the sequel, we introduce alternative solutions which do not
require increasing the computational cost and are valid for any possible kind
of weight functions used within I-MTM.

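Second solution. The problem described above disappears if we consider a
unique proposal pdf defined as a mixture, i.e.,

$$\psi(x) = \frac{1}{N}\sum_{n=1}^{N} q_n(x).$$

Hence, in this case, we draw $z_1,\ldots,z_N$ from $\psi(x)$ and the weights are

$$w(z_n) = \frac{\pi(z_n)}{\psi(z_n)} = \frac{\pi(z_n)}{\frac{1}{N}\sum_{k=1}^{N} q_k(z_n)}, \quad n = 1,\ldots,N.$$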

^3 When other kinds of weights are employed, the problem could persist even when increasing $P$.
We can observe that, in this case, all the components $q_n$ appear in the
denominator of the importance weight and hence must be evaluated. Let us assume that
the previous state of the chain $x_{t-1}$ was generated from the $k$-th component
of the mixture, i.e., $q_k(x)$, in a previous iteration, and that the selected candidate
$z_j$ has been drawn from $q_j(x)$. In this scenario, both pdfs, $q_k$
and $q_j$, are involved simultaneously in the denominator of the importance weights,
avoiding the problem previously described. However, although the mixture $\psi(x)$ takes
into account all the proposal pdfs $q_n$, unlike in the I-MTM of Table 2, in
this case only a subset of the components $\{q_1(x),\ldots,q_N(x)\}$ participates in
generating candidates at each iteration. The next solution avoids this
drawback.

Third solution. The joint use of the functions $q_1(x),\ldots,q_N(x)$ (with equal
proportion, at each iteration) in general increases the robustness of the
resulting algorithm. Namely, if no information is available to choose the best
proposal in the set $\{q_1(x),\ldots,q_N(x)\}$, a more robust strategy consists in
always employing the complete set of functions. The deterministic mixture
(DM) approach (Owen and Zhou 2000; Elvira et al. 2015a,b), successfully
applied in different sophisticated Monte Carlo algorithms (Cornuet et al.
2012; Martino et al. 2015b,c), provides a possible solution. Indeed, using the
DM approach, we can draw one sample $z_n$ from each proposal pdf $q_n(x)$, i.e.,

$$z_1 \sim q_1(x), \ldots, z_N \sim q_N(x),$$

exactly as in Step 1 of Table 2, and then assign the corresponding DM weights

$$w(z_n) = \frac{\pi(z_n)}{\psi(z_n)} = \frac{\pi(z_n)}{\frac{1}{N}\sum_{k=1}^{N} q_k(z_n)}, \quad n = 1,\ldots,N.$$

It is possible to show that this approach is valid and that it can be interpreted as a
variance reduction technique for sampling from a mixture of pdfs. Namely, we
use a quasi-Monte Carlo approach for generating the indices $j_n$, $n = 1,\ldots,N$,
i.e., the deterministic sequence $j_1 = 1, j_2 = 2, \ldots, j_N = N$, and then
$z_n \sim p(x|j_n) = q_n(x)$ for $n = 1,\ldots,N$. The DM approach improves the
performance of the IS numerical approximation (Owen and Zhou 2000; Elvira
et al. 2015a). Observe that, also in this case, we solve the issue, since again all
the proposals are included in the denominator of the weights, and we always
use all the proposals $q_1,\ldots,q_N$ at each iteration (as in Table 2).
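The deterministic mixture sampling and weighting can be sketched as below. The sketch covers only the drawing of the candidates and the computation of the DM weights, not the acceptance step; `draw_dm`, `dm_weights`, `samplers` and `pdfs` are hypothetical names, with `samplers[k](rng)` assumed to return one draw from the k-th proposal and `pdfs[k](x)` its density at `x`.

```python
import numpy as np

def draw_dm(samplers, rng):
    """Deterministic index sequence j_1 = 1, ..., j_N = N: exactly one draw
    from each proposal, instead of drawing the mixture index at random."""
    return [s(rng) for s in samplers]

def dm_weights(z, target, pdfs):
    """DM weights w(z_n) = pi(z_n) / psi(z_n), where psi is the equally
    weighted mixture psi(x) = (1/N) * sum_k q_k(x) of all the proposals."""
    n = len(pdfs)
    w = np.empty(len(z))
    for i, zi in enumerate(z):
        psi = sum(q(zi) for q in pdfs) / n  # every proposal is evaluated at z_i
        w[i] = target(zi) / psi
    return w
```

Each weight requires evaluating all N proposal densities at the candidate, so the cost in proposal evaluations grows from N to N^2 per iteration, while the number of target evaluations is unchanged.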


6 Numerical simulations: localization in a wireless sensor network 


We consider the problem of positioning a target $X$ in a two-dimensional space
using range measurements (Ali et al. 2007; Fitzgerald 2001). More formally,
we consider a random vector $X = [X_1, X_2]^\top$ denoting the target's position in
$\mathbb{R}^2$. The measurements are obtained from 6 sensors located at $h_1 = [-5,1]^\top$,
$h_2 = [-2,6]^\top$, $h_3 = [0,0]^\top$, $h_4 = [5,-6]^\top$, $h_5 = [6,4]^\top$ and $h_6 = [-4,-4]^\top$,
and the observation equations are given by

$$R_j = -10\log(\ldots) + \Omega_j, \quad j = 1,\ldots,6, \tag{13}$$

where $\Omega_j$ are i.i.d. Gaussian random variables, $\Omega_j \sim \ldots$ Let us
assume that we receive the observation vector $r = [26, 26.5, 25, 28, 28, 25.3]^\top$. In
order to perform Bayesian inference, we consider a non-informative prior over
$X$ (i.e., an improper uniform density on $\mathbb{R}^2$), and study the posterior pdf,
$\bar{\pi}(x) = p(x|r) \propto p(r|x)p(x)$. A contour plot of $\bar{\pi}(x) \propto \pi(x)$ is shown in the
accompanying figure.

We run different MTM schemes for drawing samples from the posterior
$\bar{\pi}(x)$. In order to highlight the described issues, we set the starting point
of the chain at $x_0 = [-6,-6]^\top$, forcing the chain to escape from a region
of low probability of $\bar{\pi}(x)$. We run 500 independent simulations of different
MTM schemes with $t = 1,\ldots,T$ (we set $T = 2000$ for RW-MTM and
$T = 4000$ for I-MTM), and compute the expected time needed for the chain
to escape from the region around $x_0$ and reach the region containing the
modes of the target. For this purpose, at each iteration of the algorithm,
we calculate the Euclidean distances $d_{1,t} = \|x_t - x_0\|$ and $d_{2,t} = \|x_t - \mu\|$,
where $\mu = E_{\bar{\pi}}[X] = [-0.753, -0.037]^\top$ is the expected value of $X \sim \bar{\pi}(x)$.^4 At
each run, we obtain the first iteration $\tau^*$ such that $d_{1,\tau^*} > d_{2,\tau^*}$; hence $\tau^*$ can
be interpreted as the time that the chain remained trapped around $x_0$ in the
specific run (see the figures for examples of $\tau^*$). Clearly, we have $1 \leq \tau^* \leq T$.
Averaging over the 500 independent runs, we approximate
the expected time $E[\tau^*]$.
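The escape time of a single run can be computed from the stored chain as sketched below. The function name `escape_time` is hypothetical, and returning the chain length T when the chain never escapes is an assumption made here so that the result always lies between 1 and T.

```python
import numpy as np

def escape_time(chain, x0, mu):
    """First iteration tau* (1-based) at which the state is farther from the
    starting point x0 than from the target mean mu, i.e., d_1 > d_2.
    Assumption: returns T = len(chain) if this never happens."""
    chain = np.asarray(chain, dtype=float)
    d1 = np.linalg.norm(chain - np.asarray(x0, dtype=float), axis=1)
    d2 = np.linalg.norm(chain - np.asarray(mu, dtype=float), axis=1)
    hits = np.flatnonzero(d1 > d2)
    return int(hits[0]) + 1 if hits.size else len(chain)
```

Averaging `escape_time` over independent runs gives the Monte Carlo approximation of the expected escape time used in the comparisons.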

RW-MTM. For the random-walk MTM method, we consider a Gaussian
proposal $q(x|x_{t-1}) = \mathcal{N}(x; x_{t-1}, \Sigma)$ where $\Sigma = \sigma^2 I_2$ with $\sigma \in \{0.5, 0.8, 1\}$.
We test different averaged numbers of tries $\bar{N} \in \{50, 100, 200, 500, 1000\}$. Thus,
in the standard RW-MTM scheme, we set $N = \bar{N}$, whereas in the proposed
mixture of MTM kernels in Eq. (9), we consider $M = 3$ and $N_1 = 1$, $N_2 = \bar{N}$,
$N_3 = 2\bar{N} - 1$, so that we always have

$$\bar{N} = \frac{N_1 + N_2 + N_3}{3}.$$

Therefore, the averaged computational cost is the same in both schemes,
in terms of evaluations of the target distribution. The results, in terms of
the expected number of iterations $E[\tau^*]$, are provided in Table 3. First of
all, observe that, in general, $E[\tau^*]$ grows if the number of tries $\bar{N}$ increases,
especially for the standard RW-MTM method (recall that for the standard
RW-MTM scheme $N = \bar{N}$). The expected number of iterations $E[\tau^*]$ of the
novel MTM technique with variable number of tries (introduced in Section
3.1) is always smaller than the corresponding value of the standard RW-MTM
method. Namely, the novel scheme always outperforms the standard
one, escaping from the region around $x_0$ and reaching the modes of $\bar{\pi}(x)$
more quickly, whereas the standard RW-MTM method remains stuck around
$x_0$ for several iterations, which harms its performance. The figures show the
improvement in the mixing with the proposed solution with respect to the
standard RW-MTM technique.

^4 We have computed the vector $E_{\bar{\pi}}[X]$ numerically, using a computationally expensive thin
grid in $\mathbb{R}^2$.

Furthermore, the Mean Square Error (MSE) in the estimation of $E_{\bar{\pi}}[X]$
obtained by RW-MTM (and averaged over 500 runs) is provided in Table
4. In this case, we set $\sigma = 1$ and the initial state is chosen randomly,
$x_0 \sim \mathcal{U}([-6,6] \times [-6,6])$ (i.e., uniformly in the square $[-6,6] \times [-6,6]$), at
each run. We can observe that the novel scheme always provides the smallest
MSE, confirming the robustness of the proposed solution.

I-MTM. For the I-MTM scheme, we consider $N = 2$ proposal pdfs and
also $P = N = 2$ tries (exactly as in the algorithm described in
Table 2). Furthermore, the proposal pdfs are both Gaussian, specifically,
$q_n(x) = \mathcal{N}(x; \mu_n, \Sigma)$, for $n = 1, 2$, with $\mu_1 = [-6,-6]^\top$, $\mu_2 = [0,0]^\top$
in the first configuration (denoted as Conf1), and $\mu_1 = [-6,-6]^\top$, $\mu_2 =
[-1,-2]^\top$ in a second one (denoted as Conf2). Thus, the second proposal
pdf is always well-located, unlike the first one. The covariance matrix is the
same for both proposals, $\Sigma = \sigma^2 I_2$, and we test several values of $\sigma$, i.e.,
$\sigma \in \{1.25, 1.3, 1.35, 1.4\}$. As an alternative scheme we consider the use of the
deterministic mixture approach proposed in Section 5.1. We compute again
the expected number of iterations $E[\tau^*]$ for reaching the modes starting from
$x_0 = [-6,-6]^\top$ and set $T = 4000$ as the length of the chain in this case. The
results are provided in Table 5. We can observe that with the deterministic
mixture approach the chain is able to jump easily to the regions of high
probability of $\bar{\pi}$, unlike with the standard I-MTM scheme. This occurs for
every value of $\sigma$. With the standard I-MTM scheme the chain remains trapped
around $x_0$ for several iterations, jeopardizing the performance of the algorithm
(see also Table 6).

The MSE values given in Table 6 (and averaged over 500 runs) show that
the improvement obtained by the novel scheme is even more evident than in
the RW-MTM case. We have considered Conf2 and the initial state is chosen
randomly, $x_0 \sim \mathcal{U}([-6,6] \times [-6,6])$, at each run.


7 Conclusions 

In this work, we have described different scenarios where MTM schemes
do not have the desired behavior, which prevents a fast exploration of the state
space. These drawbacks cannot be solved simply by increasing the computational
effort, in terms of the number of tries used. We have restricted the description
of the problematic cases to the importance weights for the
sake of simplicity, but the issues persist with other generic weight functions.
Furthermore, we have provided and discussed different solutions that solve the
Table 3 Expected number of iterations $E[\tau^*]$ required to escape from the region around
$x_0 = [-6,-6]^\top$ with RW-MTM.

Scheme     σ      N = 50     N = 100    N = 200    N = 500    N = 1000
standard   0.5    101.922    165.320    276.454    431.606    601.050
novel      0.5     67.237     72.349     81.253     92.798     88.444
standard   0.8    205.299    367.358    612.442   1098.5     1363.1
novel      0.8     49.711     51.557     49.405     49.706     56.145
standard   1      237.326    443.080    709.808    784.644    699.614
novel      1       43.436     41.236     33.906     37.812     39.270


Table 4 MSE in the estimation of $E_{\bar{\pi}}[X]$, obtained by RW-MTM, with $\sigma = 1$ and
$x_0 \sim \mathcal{U}([-6,6] \times [-6,6])$, i.e., randomly chosen at each run. The standard and the novel
scheme are tested with different (fixed or averaged) numbers of tries N.

Scheme     N = 50    N = 100   N = 200   N = 500   N = 1000
standard   0.1702    0.1193    0.0892    0.0542    0.0266
novel      0.0533    0.0428    0.0329    0.0320    0.0228


Table 5 Expected number of iterations $E[\tau^*]$ required to escape from the region around
$x_0 = [-6,-6]^\top$ with I-MTM.

Scheme     Conf   σ = 1.25   σ = 1.3    σ = 1.35   σ = 1.4
standard   1      2967.6     1185.6     128.102    15.610
novel      1         7.338     10.198    13.652    10.834
standard   2      3015.6     1212.9     139.816    20.548
novel      2        10.130     20.454     6.989    15.920


Table 6 MSE in the estimation of $E_{\bar{\pi}}[X]$, obtained by I-MTM, with Conf2 and
$x_0 \sim \mathcal{U}([-6,6] \times [-6,6])$, i.e., randomly chosen at each run.

Scheme     σ = 1.25   σ = 1.3   σ = 1.35   σ = 1.4
standard   6.7943     6.4345    5.9183     5.5595
novel      0.7677     0.6987    0.3135     0.3055


previously described problems, as also shown by the numerical simulations. The
proposed MTM schemes are in general more robust than the corresponding
standard MTM techniques.
A Alternative weights in I-MTM


Other possible weight functions can be employed within MTM schemes without jeopardizing
the ergodicity of the Markov chain. Let us consider the I-MTM scheme in Table 2 using a
generic weight function, bounded and positive, i.e., $w_n(x) > 0$ for all $n$. In this case,
we also have to assume $\pi(x) > 0$ for all $x \in \mathcal{X}$. As shown in (Martino and Read 2013;
Pandolfi et al. 2010), the adequate probability for accepting the jump from $x_{t-1}$ to $z_j$ in
this case is

$$\alpha(x_{t-1},z_j) = \min\left[1, \frac{\pi(z_j)\,q_j(x_{t-1})}{\pi(x_{t-1})\,q_j(z_j)} \frac{W_x}{W_z}\right],$$

where

$$W_z = \frac{w_j(z_j)}{\sum_{n=1}^{N} w_n(z_n)}, \qquad W_x = \frac{w_j(x_{t-1})}{\left[\sum_{n=1}^{N} w_n(z_n)\right] - w_j(z_j) + w_j(x_{t-1})}. \tag{14}$$

If the chosen weights are the importance weights, $w_n(x) = \frac{\pi(x)}{q_n(x)}$, then Eq. (14) coincides
with Eq. (12). Moreover, note that, in any case, $0 < W_z \leq 1$ and $0 < W_x \leq 1$. As
explained in Section 5, in general, it often occurs that $q_j(z_j) > q_j(x_{t-1})$, since $z_j \sim q_j(z)$
whereas $x_{t-1}$ has been generated from a generic $q_k$ with $k \in \{1,\ldots,N\}$. Thus, $\frac{q_j(x_{t-1})}{q_j(z_j)}$
tends to be close to zero and, as a consequence, often $\alpha \approx 0$, regardless of the choice of the
weight functions. Observe that if we employ the set of proposal pdfs $q_j(x)$ as a mixture
$\psi(x)$, as suggested in Section 5.1, the problem is solved also in this case.
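The reduction of Eq. (14) to Eq. (12) under importance weights can be checked in a few lines. The derivation below is a sketch that takes the acceptance ratio of Eq. (14) in the form $\frac{\pi(z_j)q_j(x_{t-1})}{\pi(x_{t-1})q_j(z_j)}\frac{W_x}{W_z}$ and introduces the shorthand $S$ for the sum of the candidate weights, which is not notation used elsewhere in the text.

```latex
\begin{align*}
S &= \sum_{n=1}^{N} w_n(z_n), \qquad w_n(x) = \frac{\pi(x)}{q_n(x)}, \\
\frac{\pi(z_j)\, q_j(x_{t-1})}{\pi(x_{t-1})\, q_j(z_j)}
  &= \frac{w_j(z_j)}{w_j(x_{t-1})}, \\
\frac{W_x}{W_z}
  &= \frac{w_j(x_{t-1})}{S - w_j(z_j) + w_j(x_{t-1})} \cdot \frac{S}{w_j(z_j)}, \\
\frac{w_j(z_j)}{w_j(x_{t-1})} \cdot \frac{W_x}{W_z}
  &= \frac{S}{S - w_j(z_j) + w_j(x_{t-1})}.
\end{align*}
```

The last line is the ratio inside the minimum of Eq. (12), so the two acceptance probabilities coincide for importance weights.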















